Asynchronous Execution of Python Code on Task Based Runtime Systems
Despite advancements in parallel and distributed computing, the complexity of
programming on High Performance Computing (HPC) resources has deterred many
domain experts, especially in machine learning and artificial intelligence
(AI), from utilizing the performance benefits of such systems. Researchers and
scientists favor high-productivity languages to avoid the inconvenience of
programming in low-level languages and the cost of acquiring the skills
required to do so. In recent years,
Python, with the support of linear algebra libraries like NumPy, has gained
popularity despite limitations that prevent such code from running in
distributed settings. Here we present a solution which maintains both
high-level programming abstractions and parallel and distributed efficiency.
Phylanx is an
asynchronous array processing toolkit which transforms Python and NumPy
operations into code which can be executed in parallel on HPC resources by
mapping Python and NumPy functions and variables into a dependency tree
executed by HPX, a general purpose, parallel, task-based runtime system written
in C++. Phylanx additionally provides introspection and visualization
capabilities for debugging and performance analysis. We have tested the
foundations of our approach by comparing our implementation of widely used
machine learning algorithms to accepted NumPy standards.
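The dependency-tree execution model described above can be sketched with the Python standard library alone. This is an illustrative approximation, not Phylanx's actual API (Phylanx uses a decorator and the HPX C++ runtime underneath); the `task` and `const` helpers here are hypothetical:

```python
# Sketch of the dependency-tree idea behind Phylanx, using only the
# standard library. Phylanx's real API and HPX backend differ; the
# helper names below (task, const) are hypothetical.
from concurrent.futures import ThreadPoolExecutor

pool = ThreadPoolExecutor(max_workers=8)

def const(x):
    """Wrap a plain value as a resolved node of the tree (a future)."""
    return pool.submit(lambda: x)

def task(fn, *deps):
    """Schedule fn to run once all dependency futures have resolved,
    returning a future for its result (an interior node of the tree)."""
    return pool.submit(lambda: fn(*[d.result() for d in deps]))

# Build the tree for (a + b) * (a - b) without evaluating eagerly;
# the two independent subtrees (sum and difference) may run in parallel.
a, b = const(6.0), const(4.0)
s = task(lambda x, y: x + y, a, b)   # depends on a, b
d = task(lambda x, y: x - y, a, b)   # depends on a, b
r = task(lambda x, y: x * y, s, d)   # depends on s, d

print(r.result())  # 20.0
```

The key property this mimics is that evaluation order is derived from data dependencies rather than program order, which is what lets a task-based runtime like HPX overlap independent work.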
Simulating Stellar Merger using HPX/Kokkos on A64FX on Supercomputer Fugaku
The increasing availability of machines relying on non-GPU architectures,
such as ARM A64FX in high-performance computing, provides a set of interesting
challenges to application developers. In addition to requiring code portability
across different parallelization schemes, programs targeting these
architectures have to be highly adaptable in terms of compute kernel sizes to
accommodate different execution characteristics for various heterogeneous
workloads. In this paper, we demonstrate an approach to code and performance
portability that is based entirely on established standards in the industry. In
addition to applying Kokkos as an abstraction over the execution of compute
kernels on different heterogeneous execution environments, we show that the use
of standard C++ constructs as exposed by the HPX runtime system enables superb
portability in terms of code and performance based on the real-world Octo-Tiger
astrophysics application. We report our experience with porting Octo-Tiger to
the ARM A64FX architecture provided by Stony Brook's Ookami and Riken's
Supercomputer Fugaku and compare the resulting performance with that achieved
on well-established GPU-oriented HPC machines such as ORNL's Summit, NERSC's
Perlmutter, and CSCS's Piz Daint. Octo-Tiger scaled well on Supercomputer
Fugaku without any major code changes due to the abstraction levels provided
by HPX and Kokkos. Adding vectorization support for ARM's SVE to Octo-Tiger
was trivial thanks to the use of standard C++ constructs.
From Piz Daint to the Stars: Simulation of Stellar Mergers using High-Level Abstractions
We study the simulation of stellar mergers, which requires complex
simulations with high computational demands. We have developed Octo-Tiger, a
finite volume grid-based hydrodynamics simulation code with Adaptive Mesh
Refinement which is unique in conserving both linear and angular momentum to
machine precision. To face the challenge of increasingly complex, diverse, and
heterogeneous HPC systems, Octo-Tiger relies on high-level programming
abstractions.
We use HPX with its futurization capabilities to ensure scalability both
within and between nodes, and present first results of replacing MPI with
libfabric, achieving up to a 2.8x speedup. We extend Octo-Tiger to heterogeneous
GPU-accelerated supercomputers, demonstrating node-level performance and
portability. We show scalability up to full system runs on Piz Daint. For the
scenario's maximum resolution, the compute-critical parts (hydrodynamics and
gravity) achieve 68.1% parallel efficiency at 2048 nodes.

Comment: Accepted at SC1
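Futurization, as used by HPX here, means composing asynchronous results with continuations (in HPX, `future::then`) so each pipeline stage starts as soon as its input is ready, overlapping communication and computation. A rough standard-library sketch, where `then()` is a hypothetical helper and not an HPX or Octo-Tiger API:

```python
# Rough sketch of continuation-style futurization (analogous to HPX's
# future::then) using the Python standard library; then() is a
# hypothetical helper, not an HPX API.
from concurrent.futures import ThreadPoolExecutor, Future

pool = ThreadPoolExecutor(max_workers=4)

def then(fut, fn):
    """Return a new future that applies fn to fut's result once ready,
    without blocking the caller; the continuation fires as soon as fut
    completes, in whichever thread completes it."""
    out = Future()
    def cb(done):
        try:
            out.set_result(fn(done.result()))
        except Exception as exc:
            out.set_exception(exc)
    fut.add_done_callback(cb)
    return out

# A toy pipeline: "receive" data, then apply a stencil, then reduce.
# Each stage is attached before the previous one finishes; nothing blocks
# until the final result is actually demanded.
recv = pool.submit(lambda: [1.0, 2.0, 3.0])
stencil = then(recv, lambda xs: [2.0 * x for x in xs])
total = then(stencil, sum)
print(total.result())  # 12.0
```

In HPX the same pattern is expressed in C++ and scheduled by its task-based runtime, which is what allows the hydrodynamics and gravity solvers to interleave without explicit synchronization points.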
Methylation of H3-Lysine 79 Is Mediated by a New Family of HMTases without a SET Domain
The N-terminal tails of core histones are subjected to multiple covalent modifications, including acetylation, methylation, and phosphorylation [1]. Similar to acetylation, histone methylation has emerged as an important player in regulating chromatin dynamics and gene activity [2–4]. Histone methylation occurs on arginine and lysine residues and is catalyzed by two families of proteins, the protein arginine methyltransferase family and the SET-domain-containing methyltransferase family [3]. Here, we report that lysine 79 (K79) of H3, located in the globular domain, can be methylated. K79 methylation occurs in a variety of organisms ranging from yeast to human. In budding yeast, K79 methylation is mediated by the silencing protein DOT1. Consistent with conservation of K79 methylation, DOT1 homologs can be found in a variety of eukaryotic organisms. We identified a human DOT1-like (DOT1L) protein and demonstrated that this protein possesses intrinsic H3-K79-specific histone methyltransferase (HMTase) activity in vitro and in vivo. Furthermore, we found that the K79 methylation level is regulated throughout the cell cycle. Thus, our studies reveal a new methylation site and define a novel family of histone lysine methyltransferases.